Reinforcement Learning in POMDPs: Instance-Based State Identification vs. Fixed Memory Representations
Abstract
This paper explores nearest sequence memory, an instance-based state identification technique presented by McCallum (1994). The algorithm uses a basic k-nearest-neighbor approach to address hidden state in reinforcement learning problems. We compare it with a more commonly used fixed-memory representation, history windows.
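To make the contrast concrete, the following is a minimal sketch of the two ideas, not the implementation from the paper. It assumes each time step is recorded as an (action, observation) pair and that similarity between two moments is the number of consecutive identical steps leading up to them. The names `match_length`, `nearest_sequences`, and `history_window` are hypothetical, and the sketch leaves out rewards and the Q-value averaging over neighbors that a full learner would need.

```python
from typing import List, Tuple

# One recorded time step: (action, observation). The field layout is an
# assumption made for illustration only.
Step = Tuple[str, str]


def match_length(history: List[Step], i: int, j: int) -> int:
    """Count how many consecutive steps ending at i and at j are identical."""
    n = 0
    while i - n >= 0 and j - n >= 0 and history[i - n] == history[j - n]:
        n += 1
    return n


def nearest_sequences(history: List[Step], k: int) -> List[int]:
    """Return the indices of the k past steps whose preceding history
    matches the most recent history for the longest stretch."""
    t = len(history) - 1  # index of the current step
    scored = [(match_length(history, i, t), i) for i in range(t)]
    # Longest match first; ties are broken in favor of more recent steps.
    scored.sort(key=lambda s: (-s[0], -s[1]))
    return [i for _, i in scored[:k]]


def history_window(history: List[Step], n: int) -> Tuple[Step, ...]:
    """Fixed-memory alternative: the last n steps (n >= 1) form the state key."""
    return tuple(history[-n:])


history = [("L", "wall"), ("F", "open"), ("F", "door"),
           ("L", "wall"), ("F", "open")]
print(nearest_sequences(history, k=2))
print(history_window(history, n=2))
```

In this sketch the instance-based lookup adapts its effective memory length to the data, since a neighbor can match over one step or many, whereas the history window always uses exactly `n` steps regardless of how much context is needed to disambiguate the state.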
Similar resources
Hidden state and reinforcement learning with instance-based state identification
Real robots with real sensors are not omniscient. When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, bounded field of view and limited attention, we say the robot suffers from the hidden state problem. State identification techniques use history information to uncover hidden state. Some previous ap...
Free-energy-based reinforcement learning in a partially observable environment
Free-energy-based reinforcement learning (FERL) can handle Markov decision processes (MDPs) with high-dimensional state spaces by approximating the state-action value function with the negative equilibrium free energy of a restricted Boltzmann machine (RBM). In this study, we extend the FERL framework to handle partially observable MDPs (POMDPs) by incorporating a recurrent neural network that ...
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make ...
Metric learning for reinforcement learning agents
A key component of any reinforcement learning algorithm is the underlying representation used by the agent. While reinforcement learning (RL) agents have typically relied on hand-coded state representations, there has been a growing interest in learning this representation. While inputs to an agent are typically fixed (i.e., state variables represent sensors on a robot), it is desirable to auto...
Reinforcement Learning by Policy Search
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed...
Publication date: 2003